Emerging Paradigms: The Case of Artificial Intelligence Safety
Note: This is a paper I wrote as part of a philosophy of science graduate seminar. At the moment, I consider it a work in progress and welcome constructive feedback.
1. Introduction
This paper closely examines the nascent field of artificial intelligence (AI) safety as it is currently evolving into a paradigm of its own in computer science. This subfield of AI is concerned with one central question: how to ensure that machine learning models have goals aligned with human goals. This puzzle is widely known as the alignment problem (Bostrom 2016) or the problem of control (Russell 2019). The paper is centered around the proposal that one requirement for solving the alignment problem is to reach the stage of normal science in the new paradigm of AI safety.
To support this proposal, I will offer an account of how the research efforts to transform the field into a paradigm reflect Kuhn’s model of the structure of science presented in the Structure of Scientific Revolutions and what this model specifically predicts for AI safety. I will first describe the characteristics of the field at its pre-paradigmatic state and examine the nature of the alignment problem and how it should be understood in relation to questions about how science changes. Then, I will suggest the desiderata for AI safety to qualify as a paradigmatic field according to Kuhn’s epistemology of science. Further, the development of AI safety is tightly associated with the discussion about existential risks from technological progress. I will use Kuhn’s approach to the history of science to discuss the role of philosophy of science in the transformation of the field into a paradigm, with the aim of minimizing existential risks. My overarching claim is that history and philosophy of science help provide a more comprehensive understanding of risks from advanced AI models, and that applying Kuhn’s view to the development of AI safety is crucial for constructing an epistemological account of such risks and challenges.
2. AI safety: a field at the pre-paradigmatic state
The opportunity to observe an emerging scientific field transform into a paradigm is accompanied by intrinsic uncertainty and metascientific deliberation. Uncertainty, because the field lacks the maturity required to be studied as a discipline of its own. Maturity here entails unity, clarity, and concreteness in the research agendas of scientists (Hibbert 2016). In Kuhn’s terms, at this stage, researchers have no consensus on the body of concepts, phenomena, and techniques to use when working. They also do not follow the same rules or standards for their scientific practice, simply because no such rules or standards have been established. Establishing them is no easy task; “the road to a firm research consensus is extraordinarily arduous” (Kuhn 2012, 15). Metascientific deliberation is necessary to account for the uniqueness of the generation of novel scientific knowledge and practice as well as the difficulties that might be encountered on the road to normal science and puzzle-solving (Kuhn 2012).
Diagnosing the difficulties of the pre-paradigmatic phase will help illustrate how Kuhn’s model applies to AI safety. The first difficulty concerns narrowing down the space of research directions. Without a paradigm, or at least one obvious paradigm candidate, all hypotheses, research proposals, and agendas seem equally promising and worthy of investigation. Moreover, we observe a disunity of frameworks concerning the concepts, theories, agendas, practices, methodological tools, and other criteria for what qualifies as having high explanatory power. As of 2022, there are three main research directions in AI safety, but it is highly controversial whether any of them would qualify as a paradigm independently. These research directions are 1) Assistance Games, motivated in (Russell 2019) and developed in (Fickinger et al. 2020), 2) Agent Foundations (Yudkowsky 2008), and 3) Prosaic Alignment, which includes three proposals: HCH (Christiano 2018), ELK (Christiano 2022), and formalizing heuristic arguments (Christiano, Neyman, and Xu 2022). This categorization is not exhaustive, but it provides a sense of the alignment research landscape and the priorities of prominent researchers. Different teams and laboratories are nevertheless optimizing for different agendas and goals, as there is no agreement on which is most likely to succeed.
A second, follow-up difficulty is that most researchers tend to be fundamentally confused about their field. This feature of the pre-paradigmatic stage is explicitly expressed and acknowledged by alignment researchers working on different agendas. For example, John Wentworth indicates that the first step of his research plan is to “sort out our fundamental confusions about agency” (Wentworth 2021), admitting that experts in the field share a “fundamental confusion” and that there is no explicit consensus on the nature of the subject matter or the best ways to approach it.
One last but crucial difficulty for pre-paradigmatic AI safety is the lack of the published corpus of work that goes together with a highly organized discipline. As Kuhn notes: “in the sciences, the formation of specialized journals, the foundation of specialists’ societies, and the claim for a special place in the curriculum have usually been associated with a group’s first reception of a single paradigm” (Kuhn 2012, 19). In other words, the institutional norms and patterns are yet to be established. This includes having a well-defined community of AI safety scientists, one or more peer-reviewed AI safety journals, regular conferences, university courses[1], and professorships.
There are many signs that these difficulties will be overcome in the near-term future. For example, the fact that many students and researchers have shifted their academic interests towards machine learning alignment is one way to measure the growth of the field into a scientific community. According to one estimate, approximately 400 people worldwide currently work directly on AI safety, including those who do not do strictly scientific work but are adjacent to the developing research paradigm.[2] Having these researchers pursue the existing directions, conduct empirical work and experimentation to falsify hypotheses, and share their findings indicates that the field is at least receiving adequate attention to eventually become its own paradigm.
Before diving into the nature of the alignment problem, it is valuable to address why it is so critical to consider the development of the AI safety paradigm in particular, among all fields that might be under transformation at this point. The answer is that AI safety entering its normal science phase will have an extraordinary impact on society at large (Taylor et al. 2016). This is primarily because whether we are able to control extremely powerful machines or transformative AI systems (TAI) might be the main factor determining the long-term future of humanity. To understand the significance of a fully formed field of AI safety, we must take into account that the stakes of solving the alignment problem are particularly high as machine learning is becoming more and more a general purpose technology, i.e., “a single generic technology, recognizable as such over its whole lifetime, that initially has much scope for improvement and eventually comes to be widely used, to have many uses, and to have many spillover effects” (Crafts 2021). The risks that accompany the potential arrival of a highly powerful AI fall into two general categories: existential risks and suffering risks (Bostrom 2016). AI posing an existential risk means that highly powerful future models could be unaligned with human goals and values and could consequently cause the species to go extinct (Ibid). The hypothesis concerning suffering risks suggests that such unaligned models could be the cause of endless suffering either for humans or for other sentient beings (Tomasik 2011). All of this should be carefully considered when examining under what conditions the field of AI safety is being transformed into a paradigm of its own.
3. The nature of the alignment problem
Examining the nature of the alignment problem yields useful observations for the metascientific study of this emerging field and for placing it into the context of the discourse on risks and technological progress. To analyze the alignment problem from a philosophical standpoint, it is helpful to consider whether it shares any similarities with other problems in the history of science. To explicate the metaphysics of alignment, I draw an analogy between AI safety and the history of chemistry. In particular, the alchemists were intellectually curious “proto-scientists” (Vančik 2021), deeply confused about their methods and how likely they were to succeed. They all, however, shared a common, threefold aim: to find the Stone of Knowledge (also called “The Philosophers’ Stone”), to discover the medium of Eternal Youth and Health, and to discover the transmutation of metals (Linden 2003). Their “science” had the shape of a pre-paradigmatic field that would eventually transform into the natural science of chemistry. Importantly, their agenda ceased to be grounded upon mystical investigations as the field began to mature. As Kuhn’s model predicts, those who remained attached to the pre-paradigmatic mysticism did not get to partake in the newly formed normal science; “the transfer of allegiance from paradigm to paradigm is a conversion experience that cannot be forced”, Kuhn remarks (Kuhn 2012, 150), suggesting that a transformation takes place that facilitates the change of practice from pre-scientific to scientific.
The claim here is not that alignment maintains in any sense the mystical substrate of alchemy. What it does share is the high uncertainty, combined with attempts to work at an experimental and observational level that cannot yet be supported as it is in the physical sciences. Furthermore, it shares the intention to find something that does not yet exist, with the expectation that when it does, it will make the human world substantially and qualitatively different than it was prior to that invention. This belief in a technological product that will change the flow of history in extraordinary and perhaps unpredictable ways is worth remarking on. It also allows us to deepen the chemistry-alignment analogy: the aim of alignment work is to make highly powerful systems follow the instructions of their human designers. But, at the same time, AI teams seem to be in a “technology arms race” (Armstrong, Bostrom, and Shulman 2016) to be the first to build the most powerful machine that has ever existed, one that could have a transformative impact on humanity (Gruetzemacher and Whittlestone 2022).
It remains valuable to ask what the history of science can teach us when studying the generation of a new scientific field. Continuing with the analogy, it would be useful for the progress of alignment research to be able to trace what exactly happened when alchemy became chemistry. Several suggestions might apply, namely the articulation of one or more equations, or the discovery and analysis of a substance like phlogiston. In that sense, researchers would need to find alignment’s phlogiston, and that would bring them closer to discovering alignment’s oxygen. Of course, scientific change is not as straightforward. While ideally we would want to uncover what the successful actors in history did and apply it to a paradigm in the making, in practice this does not hold. It would become possible, however, with a generalized logic of scientific discovery that described in high detail how science operates.
4. Questions about scientific change
Another way to think about the generation of this new field is to consider whether the transformation from the pre-paradigmatic to the paradigmatic stage could be accelerated. This question simply translates into whether progress can become faster and, if so, under what conditions. It seems that throughout the history of science, faster progress has typically entailed quantitatively more empirical work in the field. Because of the rapid development of AI, one suggestion is to use AI models to advance alignment research. Using AI models could increase research productivity in alignment even if they cannot at present generate novel insights or properly theorize about scientific work in a reliable way. Such AI systems are usually large language models; they can, for example, review bibliographical material, compose summaries, and explain concepts with less jargon, among other tasks (Wei et al. 2022).
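To make this concrete, below is a minimal sketch of the kind of mundane assistance such models can already provide, namely condensing a piece of technical text into a shorter, plainer summary. It assumes the Hugging Face transformers library and a publicly available summarization checkpoint; the model name and the sample abstract are placeholders rather than recommendations, and nothing in the argument depends on this particular setup.

```python
# A minimal, illustrative sketch: using a pretrained language model to
# condense a paper abstract into a short summary for literature review notes.
# Assumes the Hugging Face `transformers` library is installed; the checkpoint
# below is a commonly used public summarization model, not a recommendation.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

abstract = (
    "Reinforcement learning from human feedback trains a reward model on "
    "pairwise preference data and then optimizes a policy against it. "
    "Misspecified rewards can lead the policy to exploit the reward model "
    "rather than satisfy the designers' intent."
)

# max_length/min_length bound the summary length in tokens; do_sample=False
# keeps the output deterministic, which is preferable for research notes.
result = summarizer(abstract, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```

Scripted assistance of this kind does not generate novel insight; the point is only that routine parts of the research pipeline can plausibly be automated, freeing researchers’ time for conceptual work.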
In Kuhn’s framework, progress in science is discontinuous. This suggests that researchers working within the established (future) paradigm of alignment theory will see the world radically differently compared to researchers working in the paradigm of “good old-fashioned AI” (Haugeland 1989). It is worth noting that there were a few researchers in the old paradigm who thought that highly powerful machines would pose a problem of control, notably I. J. Good and V. Vinge. Whether they could have predicted, however, the occurrence of the deep learning revolution (Sejnowski 2018) and what it would imply for alignment is rather debatable.
Contrary to the traditional, logical empiricist view, it is not the accumulation of new knowledge that will bring the new paradigm. Researchers will need to recognize that the anomalies and confusions of the field at the pre-paradigmatic state require different conceptual and technical tools to be resolved. However, it is generally possible to understand the problematic aspects of the field only in retrospect, i.e., once the paradigm is established.
This brings us to a commonly misunderstood concept in Kuhn’s epistemology of science, namely incommensurability. Incommensurability invites a twofold discussion about the way terms are used when placed in different theoretical contexts and about changes in worldview (Kuhn 2012; 1990). A classic example is the term “mass”; while the term appears both in the paradigm of classical mechanics and in relativity theory, it connotes different meanings depending on the paradigm. And while some concepts can be translated from an older paradigm into the vocabulary of the new one, it is generally impossible to find a one-to-one correspondence between the terms of the old paradigm and the terms of the new one. This semantic dependence on the theory implies that there is no common measure, although a comparison remains possible (Kuhn 1990). In that sense, as AI safety continues to evolve, we may find that concepts such as agency or intelligence are incommensurable with those used to describe artificially intelligent systems in the old AI paradigm. This would not suggest that such concepts cannot be compared at all, but rather that there is no way to neutrally compare, e.g., the concept of intelligence without anchoring it in a theoretical apparatus. It is also possible that some terms preserve their meanings despite the paradigm change (Kuhn 1990, 36).
The change of scientific worldview is extensively discussed by Kuhn in the Structure and by Hanson in Patterns of Discovery under the proposal of theory-ladenness. Kuhn compares the change of worldview to how we suddenly see different shapes in gestalt figures (Kuhn 2012, 114). The verb “see” here implies the automaticity of this change of perception, i.e., the scientist cannot control the shift of attention that has occurred. A typical example of this in the history of science comes from astronomy. Notably, while western Europeans only began to see change in the previously immutable heavens in the half century after the introduction of the Copernican paradigm, the Chinese had long recorded the appearance of new stars in the sky. This is because their cosmological paradigm offered a different model of celestial change.
Kuhn explains how he originally conceived the idea of belonging to a different paradigm. He describes sitting at his desk with Aristotle’s Physics open in front of him when “suddenly the fragments in my head sorted themselves out in a new way, and fell into place together […] for all at once Aristotle seemed a very good physicist indeed, but of a sort I’d never dreamed possible” (Kuhn 2014). From that point onwards, Kuhn would argue that scientists do not try to do the job of their predecessors simply in a better way than they did (Reisch 2016); they see a different world, they attend to different phenomena and features of reality. This leads to an asymmetry in that they operate within a completely unique context and thus construct different world models and provide different explanations.
In a similar vein but with a cognitive focus, Hanson describes the phenomenon of theory-ladenness to illustrate that scientific understanding depends on perceptual input and its processing. Hanson famously quotes Duhem’s example:
Enter a laboratory; approach the table crowded with an assortment of apparatus, an electric cell, silk-covered copper wire, small cups of mercury, spools, a mirror mounted on an iron bar; the experimenter is inserting into small openings the metal ends of ebony-headed pins; the iron oscillates, and the mirror attached to it throws a luminous band upon a celluloid scale; the forward-backward motion of this spot enables the physicist to observe the minute oscillations of the iron bar. But ask him what he is doing. Will he answer ‘I am studying the oscillations of an iron bar which carries a mirror’? No, he will say that he is measuring the electric resistance of the spools. If you are astonished, if you ask him what his words mean, what relation they have with the phenomena he has been observing and which you have noted at the same time as he, he will answer that your question requires a long explanation and that you should take a course in electricity (Hanson 1958, 16-17).
From both Kuhn and Hanson, it becomes clear that participating in a paradigm means learning to attend to certain phenomena and features of the parts of the world under investigation. At the same time, this implies learning to block out everything else; part of belonging to a paradigm means that scientists agree on what to attend to and what to block out.
Throughout his writings, Kuhn challenges the idea of scientific progress, for example by saying: “We must explain why science — our surest example of sound knowledge — progresses as it does, and we must first find out how in fact it does progress” (Kuhn 1970, 20). Science does not simply “get better.” It is plausible to argue that Kuhn would be skeptical of a potential technological singularity (Sandberg 2013) or of the completion of science. In Kuhn’s model, science does not move towards absolute knowledge or complete science. Since the scientific process is discontinuous, it is not meaningful to argue that science aims at “objective” truth or simply truth as correspondence to reality (Rorty 2003). Paradigms are incommensurable, and therefore there is no neutral way to talk about how one paradigm finds “more truth” than another. One might assume that Kuhn purposefully says very little about the notion of truth in the Structure (Bird 2012). He returns to the topic in the Postscript to the second edition mostly to address the various criticisms of the Structure concerning relativism (Kuhn 2012, 204). While assessing Kuhn’s understanding of truth is out of the scope of this paper, it is necessary to note that Kuhn seems to hold at least a neutralist stance about truth (Bird 2007). This has, in many instances, made him a major ally of antirealist views in science, although it is not implausible to find ways in which the Kuhnian model is compatible with scientific realism.
5. Desiderata for the new paradigm
The fact that people can form coalitional groups and agree to conform to certain rules and norms does not make their group or their work a scientific paradigm. It is thus essential to sketch out what we can expect AI safety to look like as a paradigm, based on Kuhn’s model of the structure of science. This can be regarded as experimenting with a future history of AI safety. As such, it is useful to lay out a series of desiderata that AI safety will satisfy once it has entered its paradigmatic stage. First, while the term paradigm has a long history that dates back to Aristotle’s Rhetoric (Hacking 2012), paradigms are defined in the Structure as “accepted examples of actual scientific practice—examples which include law, theory, application, and instrumentation together—[that] provide models from which spring particular coherent traditions of scientific research” (Kuhn 2012, 11). In that sense, the paradigm will shape scientific work as a whole, from education and the recruiting of new researchers to everyday practice.
It follows then that the function of a paradigm is both cognitive and normative (Kindi and Arabatzis 2013, 91), which means that paradigmatic AI safety will 1) prepare students for becoming professional researchers and members of a scientific community, 2) select the class of facts that “are worth determining both with more precision and in a larger variety of situations”, a process characteristic of normal science also called “fact gathering” (Kuhn 2012, 25-27), 3) define the specific problems that must be solved (Ibid, 27-28), 4) offer criteria for selecting these problems (Ibid, 37), 5) guarantee the existence of stable solutions to these problems (Ibid, 28), 6) provide methods and standards of solution, 7) aim at quantitative laws (Ibid), 8) make predictions, and 9) give satisfactory explanations of phenomena.
It is worth remarking that the paradigm offers what Kuhn calls “theoretical commitment”, without which important facets of scientific activity, notably the discovery of laws, do not exist. Kuhn’s examples highlight that the examination of measurements alone is never enough for finding quantitative laws. The history of science provides sufficient evidence to think that the experimental, Baconian method does apply, but only to a certain degree. Characteristically, Kuhn mentions Boyle’s Law, Coulomb’s Law, and Joule’s formula (Ibid, 28) to argue that they were discovered through the application of a particular scientific paradigm – even if it was not as explicit at the time – and not merely through the study of measurements. In that sense, the theoretical commitment encapsulates a set of assumptions and concepts that are a prerequisite for the discovery of laws. By analogy, we can anticipate that AI safety will discover and formulate laws once its theoretical commitment is clearly established and broadly accepted.
6. Metascientific projects
6.1. Conceptual clarification
In setting up a metascience of alignment, the primary question to consider is how philosophy can help the field in a concrete and straightforward way. While the possibility of this is itself contentious and pertains to an inquiry about the relationship between philosophy and science more broadly, it is arguable that philosophy can, to some extent, assist in disambiguating central concepts of AI. In the case of AI safety in particular, there are many conceptual difficulties that stem from the intrinsically perplexing nature of the study of intelligence and agency. For example, the research agenda of Agent Foundations is focused on modeling agents so that future powerful models exhibit behaviors we can predict and, consequently, control. Philosophers working on the alignment problem can contribute to the conceptual clarification and modeling of agency using analytic tools that are common in philosophical reasoning.
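To illustrate what “modeling agents” can amount to in practice, the following is a deliberately simple, hypothetical sketch of an agent represented as an expected-utility maximizer over a finite set of actions. It is not drawn from the Agent Foundations literature; the class name, the toy utility function, and the probabilities are invented for illustration, and the sketch is meant only to show the kind of formal object that conceptual work on agency tries to pin down and generalize.

```python
# A toy formalization of agency: an agent that picks the action maximizing
# expected utility under its beliefs about possible world states.
# All names and numbers are illustrative, not taken from any published proposal.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExpectedUtilityAgent:
    beliefs: Dict[str, float]                 # probability assigned to each world state
    utility: Callable[[str, str], float]      # utility of an (action, state) pair

    def expected_utility(self, action: str) -> float:
        return sum(p * self.utility(action, state)
                   for state, p in self.beliefs.items())

    def choose(self, actions: List[str]) -> str:
        # The "agentic" step: ranking the available options by expected utility.
        return max(actions, key=self.expected_utility)


# Toy decision: should the agent act on an instruction or ask for clarification?
agent = ExpectedUtilityAgent(
    beliefs={"instruction_clear": 0.7, "instruction_ambiguous": 0.3},
    utility=lambda a, s: {
        ("act", "instruction_clear"): 1.0,
        ("act", "instruction_ambiguous"): -2.0,  # acting on a misread goal is costly
        ("ask", "instruction_clear"): 0.5,
        ("ask", "instruction_ambiguous"): 0.8,
    }[(a, s)],
)
print(agent.choose(["act", "ask"]))  # prints "ask" under these numbers
```

The philosophical work enters precisely where such toy models break down: in saying what should count as a belief, a goal, or an action for systems that were never explicitly designed around these notions.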
There is one important objection to the claim in favor of the contribution of philosophical conceptual clarification. Looking at the history of science more generally, it does not seem that philosophy has typically helped directly with scientific research. In other words, while some work in the philosophy of science engages with specific problems in the sciences, it is rare that it actually accelerates the progress of a field by disambiguating fundamental concepts beyond merely semantic disagreements. For that to be a real possibility, philosophers would have to work as scientists and scientists would have to cultivate a philosophical disposition.
A reply to this objection is that throughout the history of science and across various disciplines, scientists have in many instances acknowledged the necessary role of philosophical thinking in articulating the right questions and trying to explain the world rationally. Concepts are often understood as the “building blocks” of theories (Gerring 1999). In that sense, as Popper notes in the preface to the Logic of Scientific Discovery, the method of philosophy and science is a single one, and that is rational discussion (Popper 2002, xix). More specifically on the relationship between AI and philosophy, McCarthy hints at the need for conceptual clarification aided by work on philosophical questions (McCarthy 2008). This is especially useful since both fields are concerned with concepts such as goals, knowledge, belief, etc. A central problem at the basis of AI concerns the conditions under which intelligent behavior can be tested, for example by applying the Turing test. Analyzing what the Turing test is about requires philosophical rigor in order to determine and interpret the criteria for passing the test. For that reason, it has been a common theme in philosophy, e.g., in (Searle 1980), (Dennett 1984), and (Chalmers 1994). Overall, it appears that conceptual clarification is indispensable to rational discourse, and the more thoroughly it is carried out, the more likely it is to prove conducive to paradigm transformation.
6.2. Questions about the nature of machine learning
Another project in the philosophy of science adjacent to the development of the AI safety paradigm focuses on questions about the metaphysics of machine learning and, specifically, the nature of deep neural networks. This discussion is centered around whether we can emulate the functions of human minds by imitating the neural circuitry of the human brain. Until recently, this seemed unlikely to work. Steven Pinker criticized connectionism in the early 2000s, arguing that
Humans don’t just loosely associate things that resemble each other, or things that tend to occur together. They have combinatorial minds that entertain propositions about what is true of what, and about who did that to whom, when and where and why. And that requires a computational architecture that is more sophisticated than the uniform tangle of neurons used in generic connectionist networks. (Pinker 2003, 79)
Pinker goes on to expose the limitations of connectionism that derive from this insufficiently sophisticated computational architecture of neural networks. All this is to show that a complete human thought cannot be represented in the kind of generic network used in machine learning. Specifically, neural networks would not be able to distinguish between kinds and individuals, have thoughts that are not just a summation of smaller parts (compositionality), handle logical quantification, embed one thought in another (recursion), or reason categorically (Pinker 2003).
The most recent developments in machine learning have proved all these claims false. Deep neural networks are surprisingly good at all the tasks Pinker mentions. In particular, large language models or transformers (Vaswani et al. 2017) and other deep learning architectures have generated impressive results such as writing like humans or making scientific discoveries, e.g., AlphaFold (Jumper et al. 2021). While there is no consensus on why these models are so successful (Ngo 2022), it is reasonable to conclude that intelligence was not so hard to find after all. How they work seems to be itself a problem between philosophy, science, and engineering. One or more solutions to it will likely be at the foundation of the new AI safety paradigm. Explainable AI, i.e., models that are not regarded as black boxes but are rather interpretable (Li et al. 2022), would overall make safety easier to achieve.
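For readers who want a sense of how such claims can be examined empirically, the sketch below constructs two toy behavioral probes, one for compositional role-binding (“who did what to whom”) and one for recursive embedding of propositions, and feeds them to a generic text-generation model through the Hugging Face transformers library. The prompts are invented, the gpt2 checkpoint is only a stand-in (a small model may well fail these probes), and the sketch is illustrative rather than any established benchmark.

```python
# Illustrative behavioral probes for the capacities discussed above:
# compositional role-binding and recursive embedding of one proposition
# inside another. Whether a given model passes them is an empirical question.
from transformers import pipeline

probes = {
    "compositionality": (
        "If Alice gave the book to Bob, and Bob gave it to Carol, "
        "who ends up holding the book? Answer with one name:"
    ),
    "recursion": (
        "Mary believes that John suspects that the meeting was cancelled. "
        "Who holds the belief about John? Answer with one name:"
    ),
}

# Any causal language model checkpoint can be substituted here; small models
# such as gpt2 may fail these probes, which is itself informative.
generator = pipeline("text-generation", model="gpt2")

for name, prompt in probes.items():
    completion = generator(prompt, max_new_tokens=5, do_sample=False)
    # generated_text includes the prompt, so slice it off to show the answer.
    print(name, "->", completion[0]["generated_text"][len(prompt):].strip())
```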
Lastly, considerations about whether it is possible for machine learning systems to think or become highly competent at human-level tasks go back to AI researchers in the 1950s (McCarthy 2008) and remain pertinent, appearing even in AI textbooks such as (Russell and Norvig 2009). Gaining clarity on how neural networks work and what exactly accounts for their success is almost equivalent to unlocking the nature and conditions of cognition.
7. Conclusion
The emergence of AI safety as a paradigm in the making fuels the study of how new science is generated and presents an opportunity to dive into the most fundamental questions about the nature of scientific practice and the epistemology of progress in science and technology. My aim in this paper was to explore the application of Kuhn’s theory of the structure of science to the case of AI safety. I argued that the field is at its pre-paradigmatic state, which motivates a series of metascientific observations. For these observations to make sense, I offered an account of the nature of the alignment problem, the central problem AI safety must tackle. In doing so, I suggested that there is an analogy between the development of AI safety and the history of chemistry. I then sketched out the requirements for the field to transform into a fully evolved paradigm according to Kuhn’s view on scientific change. As the field is undergoing rapid development, it is reasonable to speculate that it will transition to the paradigmatic phase in the near-term future. A potential way to accelerate this transformation is by investing in philosophically minded, metascientific projects such as conceptual clarification and the theoretical study of machine learning. If targeted at the right questions, they could be conducive to the transition to a paradigm of AI safety. The need to reach the paradigmatic state becomes more and more urgent as machine learning capabilities advance and models exhibit intelligent behavior that could eventually pose serious risks for society at large.
References
Armstrong, Stuart, Nick Bostrom, and Carl Shulman. 2016. “Racing to the Precipice: A Model of Artificial Intelligence Development.” AI & SOCIETY 31 (2): 201–6. https://doi.org/10.1007/s00146-015-0590-y.
Bird, Alexander. 2012. “Kuhn, Naturalism, and the Social Study of Science.” In Kuhn’s The Structure of Scientific Revolutions Revisited, edited by Vasso Kindi and Theodore Arabatzis, 205. Routledge.
Bostrom, Nick. 2016. Superintelligence: Paths, Dangers, Strategies. Reprint edition. Oxford, United Kingdom ; New York, NY: Oxford University Press.
Chalmers, David J. 1994. “On Implementing a Computation.” Minds and Machines 4 (4): 391–402.
Christiano, Paul, Eric Neyman, and Mark Xu. 2022. “Formalizing the Presumption of Independence.” ArXiv Preprint ArXiv:2211.06738.
Crafts, Nicholas. 2021. “Artificial Intelligence as a General-Purpose Technology: An Historical Perspective.” Oxford Review of Economic Policy 37 (3): 521–36. https://doi.org/10.1093/oxrep/grab012.
Dennett, Daniel C. 1984. “Can Machines Think?” In How We Know, edited by M. G. Shafto. Harper & Row.
Fickinger, Arnaud, Simon Zhuang, Dylan Hadfield-Menell, and Stuart Russell. 2020. “Multi-Principal Assistance Games.” ArXiv Preprint ArXiv:2007.09540.
Gerring, John. 1999. “What Makes a Concept Good? A Criterial Framework for Understanding Concept Formation in the Social Sciences.” Polity 31 (3): 357–93.
Gruetzemacher, Ross, and Jess Whittlestone. 2022. “The Transformative Potential of Artificial Intelligence.” Futures 135: 102884.
Hacking, Ian. 2012. “Introductory Essay.” In The Structure of Scientific Revolutions: 50th Anniversary Edition, by Thomas S. Kuhn. Chicago: University of Chicago Press.
Hanson, Norwood Russell. 1958. Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science. 1st edition. Cambridge: Cambridge University Press.
Haugeland, John. 1989. Artificial Intelligence: The Very Idea. Reprint edition. Cambridge, Mass: Bradford Books.
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, and Anna Potapenko. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.
Kindi, Vasso, and Theodore Arabatzis. 2013. Kuhn’s The Structure of Scientific Revolutions Revisited. Routledge.
Kuhn, Thomas S. 1970. “Logic of Discovery or Psychology of Research?” In Criticism and the Growth of Knowledge, edited by Imre Lakatos and Alan Musgrave, 1–23. Cambridge: Cambridge University Press.
———. 1990. “The Road since Structure.” In PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 1990:3–13. Philosophy of Science Association.
———. 2012. The Structure of Scientific Revolutions: 50th Anniversary Edition. 4th edition. Chicago: University of Chicago Press.
———. 2014. “What Are Scientific Revolutions?” In Philosophy, Science, and History, 71–88. Routledge.
Li, Xuhong, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, and Dejing Dou. 2022. “Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond.” Knowledge and Information Systems, 1–38.
Linden, Stanton J. 2003. The Alchemy Reader: From Hermes Trismegistus to Isaac Newton. Cambridge University Press.
McCarthy, John. 2008. “The Philosophy of AI and the AI of Philosophy.” In Philosophy of Information, 711–40. Elsevier.
Ngo, Richard. 2022. “The Alignment Problem from a Deep Learning Perspective.” ArXiv Preprint ArXiv:2209.00626.
Pinker, Steven. 2003. The Blank Slate: The Modern Denial of Human Nature. Penguin.
Popper, Karl. 2002. The Logic of Scientific Discovery. 2nd edition. London: Routledge.
Reisch, George A. 2016. “Aristotle in the Cold War: On the Origins of Thomas Kuhn’s the Structure of Scientific Revolutions.” Kuhn’s Structure of Scientific Revolutions at Fifty: Reflections on a Science Classic, 12–30.
Rorty, Richard. 2003. “Dismantling Truth: Solidarity versus Objectivity.” In The Theory of Knowledge: Classical and Contemporary Readings.
Russell, Stuart. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin Books.
Russell, Stuart, and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach. 3rd edition. Upper Saddle River: Pearson.
Sandberg, Anders. 2013. “An Overview of Models of Technological Singularity.” The Transhumanist Reader: Classical and Contemporary Essays on the Science, Technology, and Philosophy of the Human Future, 376–94.
Searle, John. 1980. “Minds, Brains, and Programs.” Behavioral and Brain Sciences 3 (3): 417–57.
Sejnowski, Terrence J. 2018. The Deep Learning Revolution. Illustrated edition. Cambridge, Massachusetts: The MIT Press.
Taylor, Jessica, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch. 2016. “Alignment for Advanced Machine Learning Systems.” In Ethics of Artificial Intelligence, 342–82. Oxford University Press.
Tomasik, Brian. 2011. “Risks of Astronomical Future Suffering.” Foundational Research Institute: Berlin, Germany.
Vančik, Hrvoj. 2021. “Alchemy.” In Philosophy of Chemistry, 39–45. Springer.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.
Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, and Donald Metzler. 2022. “Emergent Abilities of Large Language Models.” ArXiv Preprint ArXiv:2206.07682.
Yudkowsky, Eliezer. 2008. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks. Vol. 1. Oxford University Press.
Emerging Paradigms: The Case of Artificial Intelligence Safety
Note: This is a paper I wrote as part of a philosophy of science graduate seminar. At the moment, I consider it a work in progress and welcome constructive feedback.
1. Introduction
This paper closely examines the nascent field of artificial intelligence (AI) safety as it is currently evolving into a paradigm of its own in computer science. This subfield of AI is concerned with one central question: how to make machine learning models have goals aligned with human goals. This puzzle is widely known as the alignment problem (Bostrom 2016) or the problem of control (Russell 2019). The paper is centered around the proposal that one requirement to solve the alignment problem is to reach the stage of normal science in the new paradigm of AI safety.
To support this proposal, I will offer an account of how the research efforts to transform the field into a paradigm accurately reflect Kuhn’s model of the structure of science presented in the Structure of Scientific Revolutions and what this specifically predicts for AI safety. I will first describe the characteristics of the field at its pre-paradigmatic state and dive into the nature of the alignment problem and how it should be understood in relation to questions about how science changes. Then, I will suggest the desiderata for AI safety to qualify as a paradigmatic field according to Kuhn’s epistemology of science. Further, the development of AI safety is tightly associated with the discussion about existential risks from technological progress. I will use Kuhn’s approach to the history of science to discuss the role of philosophy of science in the transformation of the field into a paradigm to minimize existential risks. My overarching claim is that history and philosophy of science help provide a more comprehensive understanding of risks from advanced AI models and implementing Kuhn’s view on the development of AI safety is crucial for constructing an epistemological account of such risks and challenges.
2. AI safety: a field at the pre-paradigmatic state
The opportunity to observe an emerging scientific field transform into a paradigm is accompanied by intrinsic uncertainty and metascientific deliberation. Uncertainty, because the field lacks the maturity required to be studied as a discipline of its own. Maturity here entails unity, clarity, and concreteness in the research agendas of scientists (Hibbert 2016). In Kuhn’s terms, at this stage, researchers have no consensus on the body of concepts, phenomena, and techniques to use when working. They also do not follow the same rules or standards for their scientific practice, simply because no such rules or standards have been established. Establishing them is no easy task; “the road to a firm research consensus is extraordinarily arduous” (Kuhn 2012, 15). Metascientific deliberation is necessary to account for the uniqueness of the generation of novel scientific knowledge and practice as well as the difficulties that might be encountered on the road to normal science and puzzle-solving (Kuhn 2012).
Diagnosing the difficulties of the pre-paradigmatic phase will help illustrate how Kuhn’s model applies to AI safety. The first difficulty concerns narrowing down the research directions space. Without a paradigm, or at least one obvious paradigm candidate, all hypotheses, research proposals, and agendas seem equally promising and worthy of investigation. Moreover, we observe a disunity of frameworks that concerns the concepts, theories, agendas, practices, methodological tools, and other criteria for what qualifies as having high explanatory power. As of 2022, there are three main research directions in AI safety, but it is highly controversial whether any of them would qualify as a paradigm independently. These research directions are 1) Assistance Games, motivated in (Russell 2019) and developed in (Fickinger et al. 2020), 2) Agent Foundations (Yudkowsky 2008), and 3) Prosaic Alignment including three proposals: HCH (Christiano 2018), ELK (Christiano 2022), and formalizing heuristic arguments (Christiano, Neyman, and Xu 2022). This categorization is not exhaustive although it provides a sense of the alignment research landscape and the priorities of prominent researchers. Different teams and laboratories are nevertheless optimizing for different agendas and goals as there is no agreement on which agenda is most likely to succeed.
A second, follow-up difficulty is that most researchers tend to be fundamentally confused about their field. This feature of the pre-paradigmatic stage is explicitly expressed and acknowledged by alignment researchers working with different agendas. For example, Dr. John Wentworth indicates that the first step of his research plan is to “sort out our fundamental confusions about agency” (Wentworth 2021) admitting that experts in the field share a “fundamental confusion” and there is no explicit consensus on the nature of the subject matter and the best ways to approach it.
One last but crucial difficulty for the pre-paradigmatic AI safety is the lack of a published corpus of work that goes together with a highly organized discipline. As Kuhn notes: “in the sciences, the formation of specialized journals, the foundation of specialists’ societies, and the claim for a special place in the curriculum have usually been associated with a group’s first reception of a single paradigm” (Kuhn 2012, 19). In other words, the institutional norms and patterns are yet to be established. This includes having a rigid definition of the group of AI safety scientists, one or more AI safety peer-reviewed journals, regular conferences, university courses[1] and professorships.
There are many signs that these difficulties will be overcome in the near-term future. For example, the fact that many students and researchers have shifted their academic interests towards machine learning alignment is a way to measure the growth of the field into a scientific community. According to an estimation, there are currently approximately 400 people worldwide working directly on AI safety issues, including those that do not do strictly scientific work but are adjacent to the developing paradigm of research.[2] Having these researchers pursue the existing directions, conduct empirical work and experimentation to falsify hypotheses, and share their findings indicates that the field is at least receiving adequate attention to eventually become its own paradigm.
Before diving into the nature of the alignment problem, it is valuable to address why it is so critical to consider the development of the AI safety paradigm in particular, among all fields that might be under transformation at this point. The answer is that AI safety entering its normal science phase will have an extraordinary impact on society at large (Taylor et al. 2016). This is primarily because whether we are able to control extremely powerful machines or transformative AI systems (TAI) might be the main factor determining the long-term future of humanity. To understand the significance of a fully formed field of AI safety, we must take into account that the stakes of solving the alignment problem are particularly high as machine learning is becoming more and more a general purpose technology, i.e., “a single generic technology, recognizable as such over its whole lifetime, that initially has much scope for improvement and eventually comes to be widely used, to have many uses, and to have many spillover effects” (Crafts 2021). The risks that accompany the potential arrival of a highly powerful AI fall into two general categories: existential risks and suffering risks (Bostrom 2016). AI being an existential risk means that highly powerful future models will be unaligned with human goals and values and consequently, they could cause the species to go extinct (Ibid). The hypothesis concerning suffering risks suggests that such unaligned models could be the cause of endless suffering either for humans or other sentient beings (Tomasik 2011). All these should be carefully considered when examining under what conditions the field of AI safety is being transformed into a paradigm of its own.
3. The nature of the alignment problem
Examining the nature of the alignment problem yields useful observations for the metascientific study of this emerging field and for placing it into the context of the discourse on risks and technological progress. To analyze the alignment problem from a philosophical standpoint, it is helpful to consider whether it shares any similarities with other problems in the history of science. To explicate the metaphysics of alignment, I draw an analogy between AI safety and the history of chemistry. In particular, the alchemists were intellectually curious, “proto-scientists” (Vančik 2021), deeply confused about their methods and how likely they are to succeed. They all, however, shared a common aim summarized in this threefold: to find the Stone of Knowledge (also called “The Philosophers’ Stone”), to discover the medium of Eternal Youth and Health, and to discover the transmutation of metals (Linden 2003). Their “science” had the shape of a pre-paradigmatic field that would eventually transform into the natural science of chemistry. Importantly, their agenda ceased to be grounded upon mystical investigations as the field began to mature. As Kuhn’s model predicts, those who remained attached to the pre-paradigmatic mysticism did not get to partake in the newly formed normal science; “the transfer of allegiance from paradigm to paradigm is a conversion experience that cannot be forced”, Kuhn remarks (Kuhn 2012, 150) suggesting that a transformation takes place facilitating the change of practice from pre-scientific to scientific.
The claim here is not that alignment maintains in any sense the mystical substrate of alchemy. It shares the high uncertainty combined with attempts to work at the experimental and observational level that cannot be supported as in physical sciences. Furthermore, it shares the intention to find something that does not yet exist with the expectation that when it does, it will make the human world substantially qualitatively different than it was prior to that invention. This belief in a technological product that will change the flow of history in extraordinary and perhaps unpredictable ways is worth remarking. It also allows to deepen the chemistry-alignment analogy: the aim of alignment work is to make highly powerful systems follow instructions by human designers. But, at the same time, AI teams seem to be in a “technology arms race” (Armstrong, Bostrom, and Shulman 2016) to be the first to build the most powerful machine that has ever existed and could have transformative impact on humanity (Gruetzemacher and Whittlestone 2022).
It remains valuable to ask about what can history of science teach us when studying the generation of a new scientific field. Continuing with the analogy, it would be useful for the progress of alignment research to be able to trace what exactly happened when alchemy became chemistry. Several suggestions might apply, namely the articulation of one or more equations, or the discovery and analysis of a substance like phlogiston. In that sense, researchers would need to find alignment’s phlogiston and that would bring them closer to discovering alignment’s oxygen. Of course, scientific change is not as straightforward. While ideally, we would want to uncover what the successful actors in history did and apply it to a paradigm in the making, in practice, this does not hold. This would become possible, however, with a generalized logic of scientific discovery that would describe in high detail how science operates.
4. Questions about scientific change
Another way to think about the generation of this new field is to consider whether the transformation from the pre-paradigmatic to the paradigmatic stage could be accelerated. This question simply translates into whether progress can become faster and if yes, under what conditions. It seems that throughout the history of science, faster progress has always entailed quantitatively more empirical work in the field. Because of the rapid development of AI, one suggestion is to use AI models to advance alignment research. Using AI models could increase research productivity in alignment even if they cannot at present generate novel insights or properly theorize about scientific work in a reliable way. Such AI systems are usually large language models and for example, they can review bibliographical material, compose summaries, and explain concepts with less jargon, among other tasks (Wei et al. 2022).
In Kuhn’s framework, progress in science is discontinuous. This suggests that researchers working within the established (future) paradigm of alignment theory will see the world radically differently compared to researchers working in the paradigm of “good old-fashioned AI” (Haugeland 1989). It is worth noticing that there were a few researchers in the old paradigm thinking that highly powerful machines will pose a problem of control, notably, I. J. Good and V. Vinge. Whether they could have predicted, however, the occurrence of the deep learning revolution (Sejnowski 2018) and what it would imply for alignment is rather debatable.
Contrary to the traditional, logical empiricist view, it is not the accumulation of new knowledge that will bring the new paradigm. Researchers will need to recognize that the anomalies and confusions of the field at the pre-paradigmatic state require different conceptual and technical tools to be resolved. However, it is generally possible to understand the problematic aspects of the field, in retrospect, i.e., once the paradigm is established.
This brings us to a commonly misunderstood concept in Kuhn’s epistemology of science, i.e., incommensurability. Incommensurability invites a twofold discussion about the way terms are used when placed in different theoretical contexts and about changes in worldview (Kuhn 2012; 1990). A classic example is the term “mass”; while the term appears both in the paradigm of classical mechanics and relativity theory, it connotes different meanings depending on the paradigm. But while some concepts can be translated from an older paradigm to the vocabulary of the new one, it is generally impossible to find a one-to-one correspondence between the terms of the old paradigm and the terms of the new one. This semantic dependence on the theory implies that there is no common measure although a comparison remains possible (Kuhn 1990). In that sense, as AI safety continues to evolve, we may find that concepts such as agency or intelligence are incommensurable to what was used to describe artificially intelligent systems in the old AI paradigm. This will not suggest that there is no way to neutrally compare e.g., the concept of intelligence without anchoring it into a theoretical apparatus. It is also possible that some terms preserve their meanings despite the paradigm change (Kuhn 1990, 36).
The change of scientific worldview is extensively discussed by Kuhn in the Structure and by Hanson in Patterns of Discovery with the proposal of theory-ladenness. Kuhn parallelizes the change of worldview to how we suddenly see different shapes in gestalt figures (Kuhn 2012, 114). The verb “see” here implies the automaticity of this change of perception, i.e., the scientist cannot control the shift of attention that has occurred. A typical example of this in the history of science is astronomy. Notably, while western Europeans started discovering planets the half century after the introduction of the Copernican paradigm, the Chinese were long aware of many more stars in the sky. This is because their cosmological paradigm offered a different model of celestial change.
Kuhn explains how he originally conceived the idea of belonging to a different paradigm. He describes sitting at his desk with Aristotle’s Physics open in front of him when “suddenly the fragments in my head sorted themselves out in a new way, and fell into place together […] for all at once Aristotle seemed a very good physicist indeed, but of a sort I’d never dreamed possible” (Kuhn 2014). From that point onwards, Kuhn would argue that scientists do not try to do the job of their predecessors simply in a better way than they did (Reisch 2016); they see a different world, they attend to different phenomena and features of reality. This leads to an asymmetry in that they operate within a completely unique context and thus construct different world models and provide different explanations.
On a similar tone but with a cognitive focus, Hanson describes the phenomenon of theory-ladenness to illustrate that scientific understanding depends on perceptual input and its processing. Hanson famously quotes Duhem’s example:
Enter a laboratory; approach the table crowded with an assortment of apparatus, an electric cell, silk-covered copper wire, small cups of mercury, spools, a mirror mounted on an iron bar; the experimenter is inserting into small openings the metal ends of ebony-headed pins; the iron oscillates, and the mirror attached to it throws a luminous band upon a celluloid scale; the forward-backward motion of this spot enables the physicist to observe the minute oscillations of the iron bar. But ask him what he is doing. Will he answer ‘I am studying the oscillations of an iron bar which carries a mirror’? No, he will say that he is measuring the electric resistance of the spools. If you are astonished, if you ask him what his words mean, what relation they have with the phenomena he has been observing and which you have noted at the same time as he, he will answer that your question requires a long explanation and that you should take a course in electricity (Hanson 1958, 16-17).
Both from Kuhn and Hanson, it becomes clear that participating in a paradigm means to learn to attend to certain phenomena and features of the parts of the world that are under investigation. At the same time, this implies learning to block out everything else; part of belonging to a paradigm means that scientists agree on what to attend to and what to block out.
Throughout his writings, Kuhn challenges the idea of scientific progress, for example by saying: “We must explain why science — our surest example of sound knowledge — progresses as it does, and we must first find out how in fact it does progress” (Kuhn 1970, 20). Science does not “get better” It is plausible to argue that Kuhn would be skeptical of a potential technological singularity (Sandberg 2013) or the completion of science. In Kuhn’s model, science does not move towards absolute knowledge or complete science. Since the scientific process is discontinuous, it is not meaningful to argue that science aims at “objective” truth or simply truth as correspondence to reality (Rorty 2003). Paradigms are incommensurable, therefore there is no neutral way to talk about how one paradigm finds “more truth” than the other. One might assume that Kuhn purposefully talks very little about the notion of truth in the Structure (Bird 2012). He returns to the topic in the Postscript to the second edition mostly to address the various criticisms against the Structure about relativism (Kuhn 2012, 204). While assessing Kuhn’s understanding of truth is out of the scope of this paper, it is necessary to note that Kuhn seems to be at least a neutralist stance about truth (Bird 2007). This has, in many instances made him a major ally of antirealist views in science while it is not implausible to find ways in which the Kuhnian model is compatible with scientific realism.
5. Desiderata for the new paradigm
The fact that people can form coalitional groups and agree to conform to certain rules and norms does not make their group nor their work a scientific paradigm. It is thus essential to sketch out what we can expect AI safety to look like as a paradigm, based on Kuhn’s model of the structure of science. This can be regarded as experimenting with a future history of AI safety. As such, it is useful to lay out a series of desiderata that AI safety will qualify for once it has entered its paradigmatic stage. First, while the term paradigm has a long history that dates back to Aristotle’s Rhetoric (Hacking 2012), paradigms are defined in the Structure as “accepted examples of actual scientific practice—examples which include law, theory, application, and instrumentation together—[that] provide models from which spring particular coherent traditions of scientific research” (Kuhn 2012, 11). In that sense, the paradigm will shape scientific work as a whole, from education and recruiting new researchers to everyday practice.
It follows then that the function of a paradigm is both cognitive and normative (Kindi and Arabatzis 2013, 91) which means that paradigmatic AI safety will 1) prepare students for becoming professional researchers and members of a scientific community, 2) select the class of facts that “are worth determining both with more precision and in a larger variety of situations” a process characteristic of normal science also called “fact gathering” (Kuhn 2012, 25-27), 3) define the specific problems that must be solved (Ibid, 27-28), 4) offer criteria for selecting these problems (Ibid, 37), 5) guarantee the existence of stable solutions to these problems (Ibid, 28), 6) provide methods and standards of solutions, 7) aim at quantitative laws (Ibid). 8) make predictions, and 9) give satisfactory explanations for phenomena.
It is worth remarking that the paradigm offers what Kuhn calls “theoretical commitment”, without which important facets of scientific activity, namely the discovery of laws, do not exist. Kuhn’s examples highlight that the examination of measurements alone is never enough for finding quantitative laws. The history of science provides sufficient evidence to think that the experimental, Baconian method does apply, but only to a certain degree. Characteristically, Kuhn mentions Boyle’s Law, Coulomb’s Law, and Joule’s formula (Ibid, 28) to argue that they were discovered through the application of a particular scientific paradigm, even if it was not made explicit at the time, and not through the study of measurements. In that sense, the theoretical commitment encapsulates a set of assumptions and concepts which are a prerequisite for the discovery of laws. By analogy, we can anticipate that AI safety will find and formulate laws once its theoretical commitment is clearly established and broadly accepted.
6. Metascientific projects
6.1. Conceptual clarification
In setting up a metascience of alignment, the primary question to consider is how philosophy can help the field in a concrete and straightforward way. While the possibility of this is itself contentious and pertains to an inquiry about the relationship between philosophy and science more broadly, it is arguable that philosophy can, to some extent, assist in disambiguating central concepts of AI. In the case of AI safety in particular, there are many conceptual difficulties that stem from the intrinsically perplexing nature of the study of intelligence and agency. For example, the research agenda of Agent Foundations is focused on modeling agents so that future powerful models exhibit behaviors we can predict and, consequently, control. Philosophers working on the alignment problem can contribute to the conceptual clarification and modeling of agency using analytic tools that are common in philosophical reasoning.
There is one important objection to the claim in favor of the contribution of philosophical conceptual clarification. Looking at the history of science more generally, it does not seem that philosophy has typically helped directly with scientific research. In other words, while some work in the philosophy of science engages with specific problems in the sciences, it rarely accelerates the progress of a field by disambiguating fundamental concepts beyond merely semantic disagreements. For that to be a genuine possibility, philosophers would have to work as scientists and scientists would have to cultivate a philosophical disposition.
A reply to this objection is that throughout the history of science and across various disciplines, scientists have in many instances acknowledged the necessary role of philosophical thinking in articulating the right questions and trying to explain the world rationally. Concepts are often understood as the “building blocks” of theories (Gerring 1999). In that sense, as Popper notes in the preface to the Logic of Scientific Discovery, the method of philosophy and science is a single one, namely rational discussion (Popper 2002, xix). More specifically on the relationship between AI and philosophy, McCarthy hints at the need for conceptual clarification aided by work on philosophical questions (McCarthy 2008). This is especially useful since both fields are concerned with concepts such as goals, knowledge, and belief. A central problem at the basis of AI concerns the conditions that would allow testing intelligent behavior, for example by applying the Turing test. Analyzing what the Turing test is about requires philosophical rigor in order to determine and interpret the criteria for passing the test. For that reason, it has been a common theme in philosophy, e.g., in (Searle 1980), (Dennett 1984), and (Chalmers 1994). Overall, it appears that conceptual clarification is indispensable to rational discourse, and the more thoroughly it is carried out, the more conducive it is likely to be to the transformation of the field into a paradigm.
6.2. Questions about the nature of machine learning
Another project in the philosophy of science adjacent to the development of the AI safety paradigm focuses on questions about the metaphysics of machine learning and, specifically, the nature of deep neural networks. This discussion is centered around whether we can emulate the functions of human minds by imitating the neural circuitry of the human brain. Until recently, this seemed unlikely to work. Steven Pinker criticized connectionism in the early 2000s, arguing that
Humans don’t just loosely associate things that resemble each other, or things that tend to occur together. They have combinatorial minds that entertain propositions about what is true of what, and about who did that to whom, when and where and why. And that requires a computational architecture that is more sophisticated than the uniform tangle of neurons used in generic connectionist networks. (Pinker 2003, 79)
Pinker goes on to expose the limitations of connectionism that derive from this insufficiently sophisticated computational architecture of neural networks. All this is meant to show that a complete human thought cannot be represented in the kind of generic network used in machine learning. Specifically, neural networks would not be able to distinguish between kinds and individuals, have thoughts that are not just a summation of smaller parts (compositionality), handle logical quantification, embed one thought in another (recursion), or reason categorically (Pinker 2003).
The most recent developments in machine learning have proved all these claims false. Deep neural networks are surprisingly good at all the tasks Pinker mentions. In particular, large language models based on the transformer architecture (Vaswani et al. 2017) and other deep learning architectures have generated impressive results, such as writing like humans or making scientific discoveries, e.g., AlphaFold (Jumper et al. 2021). While there is no consensus on why these models are so successful (Ngo 2022), it is reasonable to conclude that intelligence was not so hard to find after all. How they work appears to be a problem situated between philosophy, science, and engineering, and one or more solutions to it will likely lie at the foundation of the new AI safety paradigm. Explainable AI, i.e., models that are not regarded as black boxes but are interpretable (Li et al. 2022), would overall make safety easier to achieve.
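To give a concrete sense of the architecture invoked above, the following is a minimal sketch, written in Python with NumPy, of the scaled dot-product attention operation at the heart of the transformer (Vaswani et al. 2017). The function name, the toy dimensions, and the reuse of a single matrix for queries, keys, and values are illustrative assumptions for this sketch, not features of any particular system cited in this paper.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Compute softmax(Q K^T / sqrt(d_k)) V, the core operation of a transformer layer.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity between queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted combination of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # a toy "sequence" of 3 token vectors of dimension 4
# In a real model, Q, K, and V come from learned linear projections of X;
# here X is reused directly to keep the sketch self-contained.
print(scaled_dot_product_attention(X, X, X).shape)            # (3, 4): one contextualized vector per token

Stacking many such attention layers, together with learned projections and feed-forward components, yields the models whose inner workings interpretability research seeks to open up.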
Lastly, considerations about whether it is possible for machine learning systems to think or become highly competent at human-level tasks go back to AI researchers in the 1950s (McCarthy 2008) and remain pertinent, appearing even in AI textbooks such as (Russell and Norvig 2009). Gaining clarity on how neural networks work and what exactly accounts for their success is almost equivalent to unlocking the nature and conditions of cognition.
7. Conclusion
The emergence of AI safety as a paradigm in the making fuels the study of how new science is generated and presents an opportunity to dive into the most fundamental questions about the nature of scientific practice and the epistemology of progress in science and technology. My aim in this paper was to explore the application of Kuhn’s theory of the structure of science to the case of AI safety. I argued that the field is at its pre-paradigmatic state, which motivates a series of metascientific observations. For these observations to make sense, I offered an account of the nature of the alignment problem, the central problem AI safety must tackle. In doing so, I suggested that there is an analogy between the development of AI safety and the history of chemistry. I then sketched out the requirements for the field to transform into a fully evolved paradigm according to Kuhn’s view of scientific change. As the field is undergoing rapid development, it is reasonable to speculate that it will transition to the paradigmatic phase in the near-term future. A potential way to accelerate this transformation is by investing in philosophically minded, metascientific projects such as conceptual clarification and the theoretical study of machine learning. If targeted at the right questions, these projects could be conducive to the transition to a paradigm of AI safety. The need to reach the paradigmatic state becomes more and more urgent as machine learning capabilities advance and models exhibit intelligent behavior that could eventually pose serious risks for society at large.
References
Armstrong, Stuart, Nick Bostrom, and Carl Shulman. 2016. “Racing to the Precipice: A Model of Artificial Intelligence Development.” AI & SOCIETY 31 (2): 201–6. https://doi.org/10.1007/s00146-015-0590-y.
Bird, Alexander. 2007. “What Is Scientific Progress?” Noûs 41 (1): 64–89.
———. 2012. “Kuhn, Naturalism, and the Social Study of Science.” In Kuhn’s The Structure of Scientific Revolutions Revisited, edited by Vasso Kindi and Theodore Arabatzis, 205. Routledge.
Bostrom, Nick. 2016. Superintelligence: Paths, Dangers, Strategies. Reprint edition. Oxford, United Kingdom ; New York, NY: Oxford University Press.
Chalmers, David J. 1994. “On Implementing a Computation.” Minds and Machines 4 (4): 391–402.
Christiano, Paul. 2018. “Humans Consulting HCH.” Medium. April 15, 2018. https://ai-alignment.com/humans-consulting-hch-f893f6051455.
———. 2022. “Eliciting Latent Knowledge.” Medium. February 25, 2022. https://ai-alignment.com/eliciting-latent-knowledge-f977478608fc.
Christiano, Paul, Eric Neyman, and Mark Xu. 2022. “Formalizing the Presumption of Independence.” ArXiv Preprint ArXiv:2211.06738.
Crafts, Nicholas. 2021. “Artificial Intelligence as a General-Purpose Technology: An Historical Perspective.” Oxford Review of Economic Policy 37 (3): 521–36. https://doi.org/10.1093/oxrep/grab012.
Dennett, Daniel C. 1984. “Can Machines Think?” In How We Know, edited by M. G. Shafto. Harper & Row.
Fickinger, Arnaud, Simon Zhuang, Dylan Hadfield-Menell, and Stuart Russell. 2020. “Multi-Principal Assistance Games.” ArXiv Preprint ArXiv:2007.09540.
Gerring, John. 1999. “What Makes a Concept Good? A Criterial Framework for Understanding Concept Formation in the Social Sciences.” Polity 31 (3): 357–93.
Gruetzemacher, Ross, and Jess Whittlestone. 2022. “The Transformative Potential of Artificial Intelligence.” Futures 135: 102884.
Hacking, Ian. 2012. “Introductory Essay.” In The Structure of Scientific Revolutions: 50th Anniversary Edition, by Thomas S. Kuhn. Chicago: University of Chicago Press.
Hanson, Norwood Russell. 1958. Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science. 1st edition. Cambridge: Cambridge University Press.
Haugeland, John. 1989. Artificial Intelligence: The Very Idea. Reprint edition. Cambridge, Mass: Bradford Books.
Hibbert, Ruth. 2016. “What Is an Immature Science?” International Studies in the Philosophy of Science 30 (1): 1–17. https://doi.org/10.1080/02698595.2016.1240433.
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, and Anna Potapenko. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.
Kindi, Vasso, and Theodore Arabatzis. 2013. Kuhn’s The Structure of Scientific Revolutions Revisited. Routledge.
Kuhn, Thomas S. 1970. “Logic of Discovery or Psychology of Research?” In Criticism and the Growth of Knowledge, edited by Imre Lakatos and Alan Musgrave, 1–23. Cambridge: Cambridge University Press.
———. 1990. “The Road since Structure.” In PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 1990:3–13. Philosophy of Science Association.
Kuhn, Thomas S. 2012. The Structure of Scientific Revolutions: 50th Anniversary Edition. 4th edition. Chicago ; London: University of Chicago Press.
Li, Xuhong, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, and Dejing Dou. 2022. “Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond.” Knowledge and Information Systems, 1–38.
Linden, Stanton J. 2003. The Alchemy Reader: From Hermes Trismegistus to Isaac Newton. Cambridge University Press.
McCarthy, John. 2008. “The Philosophy of AI and the AI of Philosophy.” In Philosophy of Information, 711–40. Elsevier.
Ngo, Richard. 2022. “The Alignment Problem from a Deep Learning Perspective.” ArXiv Preprint ArXiv:2209.00626.
Pinker, Steven. 2003. The Blank Slate: The Modern Denial of Human Nature. Penguin.
Popper, Karl. 2002. The Logic of Scientific Discovery. 2nd edition. London: Routledge.
Reisch, George A. 2016. “Aristotle in the Cold War: On the Origins of Thomas Kuhn’s the Structure of Scientific Revolutions.” Kuhn’s Structure of Scientific Revolutions at Fifty: Reflections on a Science Classic, 12–30.
Rorty, Richard. 2003. “Dismantling Truth: Solidarity versus Objectivity.” In The Theory of Knowledge: Classical and Contemporary Readings.
Russell, Stuart. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin Books.
Russell, Stuart, and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach. 3rd edition. Upper Saddle River: Pearson.
Sandberg, Anders. 2013. “An Overview of Models of Technological Singularity.” The Transhumanist Reader: Classical and Contemporary Essays on the Science, Technology, and Philosophy of the Human Future, 376–94.
Searle, John. 1980. “Minds, Brains, and Programs.” Behavioral and Brain Sciences 3 (3): 417–57.
Sejnowski, Terrence J. 2018. The Deep Learning Revolution. Illustrated edition. Cambridge, Massachusetts: The MIT Press.
Taylor, Jessica, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch. 2016. “Alignment for Advanced Machine Learning Systems.” In Ethics of Artificial Intelligence, 342–82. Oxford University Press.
Kuhn, Thomas S. 2014. “What Are Scientific Revolutions?” In Philosophy, Science, and History, 71–88. Routledge.
Tomasik, Brian. 2011. “Risks of Astronomical Future Suffering.” Foundational Research Institute: Berlin, Germany.
Vančik, Hrvoj. 2021. “Alchemy.” In Philosophy of Chemistry, 39–45. Springer.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.
Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, and Donald Metzler. 2022. “Emergent Abilities of Large Language Models.” ArXiv Preprint ArXiv:2206.07682.
Wentworth, John. 2021. “The Plan.” LessWrong, December. https://www.lesswrong.com/posts/3L46WGauGpr7nYubu/the-plan.
Yudkowsky, Eliezer. 2008. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks. Vol. 1. Oxford University Press.
[1] Stanford University recently introduced an “Introduction to AI Alignment” course https://explorecourses.stanford.edu/search?view=catalog&academicYear=&page=0&q=STS&filter-departmentcode-STS=on&filter-coursestatus-Active=on&filter-term-Autumn=on and an “Advanced AI Alignment” course for spring 2023 https://explorecourses.stanford.edu/search?view=catalog&filter-coursestatus-Active=on&page=0&catalog=&q=STS+20SI%3A+Advanced+AI+Alignment&collapse= .
[2] According to https://80000hours.org/problem-profiles/artificial-intelligence/.